Planning for SQL Server Data Replication
You must consider many factors
when choosing a method to distribute data; your business requirements
determine which method is right for you. In general, you need to
understand the timing and latency of your data, its independence at each
site, and your specific need to filter or partition the data.
Autonomy, Timing, and Latency of Data
Distributed data implementations
can be accomplished using a few different facilities in Microsoft SQL Server:
Integration Services (IS), the Distributed Transaction Coordinator (DTC),
and data replication. The trick is to match the right facility to the
type of data distribution you need to accomplish.
In some applications, such as
online transaction processing and inventory control systems, data must
be synchronized at all times. This requirement, called immediate transactional consistency, was known as tight consistency in previous versions of SQL Server.
SQL Server implements immediate transactional consistency data distribution in the form of two-phase commit processing. A two-phase commit, sometimes known as 2PC,
ensures that a transaction is either committed on all servers or
rolled back on all servers. This ensures that all data on
all servers is 100% in sync at all times. One of the main drawbacks of
immediate transactional consistency is that it requires a high-speed LAN
to work. This type of solution might not be feasible for large
environments with many servers because occasional network outages can
occur. These types of implementations can be built with DTC and IS.
In other applications, such
as decision support and report generation systems, 100% data
synchronization all the time is not terribly important. This
requirement, called latent transactional consistency, was known as loose consistency in previous versions of SQL Server.
Latent transactional
consistency is implemented in SQL Server via data replication.
Replication allows data to be updated on all servers, but the process is
not a simultaneous one. The result is “real-enough-time” data. This is
known as latent transactional consistency because a lag exists between
the data updated on the main server and the replicated data.
In this scenario, if you could stop all data modifications from
occurring on all servers, all the servers would eventually have the same
data. Unlike the two-phase commit model, replication works over
both LANs and WANs, and over slow or fast links.
When planning a distributed application, you must consider the effect of one site’s operation on another. This is known as site autonomy.
A site with complete autonomy can continue to function without being
connected to any other site. A site with no autonomy cannot function
without being connected to all other sites. For example, applications
that utilize two-phase commits rely on all other sites being able to
immediately accept changes sent to them. In the event that any one site
is unavailable, no transactions on any server can be committed. In
contrast, sites using merge replication can be completely disconnected
from all other sites and continue to work effectively, although data
consistency is not guaranteed until the sites reconnect and synchronize.
Luckily, some solutions combine both high data consistency and site autonomy.
Methods of Data Distribution
After you have
determined the amount of transactional latency and site autonomy needed,
based on your business requirements, you need to select the
corresponding data distribution method. Each type of data
distribution offers a different degree of site autonomy and latency. With
these distributed data systems, you can choose from several methods:
Distributed transactions—
Distributed transactions ensure that all sites have the same data at
all times. You pay a certain amount of overhead cost to maintain this
consistency. (We do not discuss this non-data-replication method here.)
Transactional replication with updating subscribers—
Users can change data at the local location, and those changes are
applied to the source database at the same time. The changes are then
eventually replicated to other sites. This type of data distribution
combines replication and distributed transactions because data is
changed at both the local site and source database.
Peer-to-peer replication—
A variation on the transactional replication with updating subscribers
theme, peer-to-peer replication is essentially full
transactional replication between two (or more) sites, but
publisher-to-publisher rather than publisher-to-updating-subscriber.
There is no publisher (parent) and subscriber (child) hierarchy.
Transactional replication—
With transactional replication, data is changed only at the source
location and is sent out to the subscribers. Because data is changed at
only a single location, conflicts cannot occur.
Snapshot replication with updating subscribers—
This method is much like transactional replication with updating
subscribers; users can change data at the local location, and those
changes are applied to the source database at the same time. The entire
changed publication is then replicated to all subscribers. This type of
replication provides higher autonomy than transactional replication.
Snapshot replication— A complete copy of the publication is sent out to all subscribers. This includes both changed and unchanged data.
Merge replication—
All sites make changes to local data independently and then update the
publisher. It is possible for conflicts to occur, but they can be
resolved.
SQL Server Replication Types
Microsoft has narrowed
the field to three major types of data replication approaches within SQL
Server: snapshot, transactional, and merge. Each publication uses
only a single replication type; however, it is possible to have
multiple replication types per database (one per publication).
Snapshot Replication
Snapshot replication makes an
image of all the tables in a publication at a single moment in time and
then moves that entire image to the subscribers. Little overhead on the
server is incurred because snapshot replication does not track data
modifications as the other forms of replication do. It is possible,
however, for snapshot replication to require large amounts of network
bandwidth, especially if the articles being replicated are large.
Snapshot replication is the easiest form of replication to set up and is
used primarily with smaller tables for which subscribers do not have to
perform updates. An example of this might be a phone list that is to be
replicated to many subscribers. This phone list is not considered
critical data, and a periodic refresh is more than
enough to satisfy all its users.
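As a sketch only, the following T-SQL shows how such a phone list publication might be defined with the standard replication stored procedures. The database, publication, and table names are illustrative, and a distributor is assumed to be already configured.

USE HRDB;
GO
-- Enable the database for publishing
EXEC sp_replicationdboption
     @dbname = N'HRDB',
     @optname = N'publish',
     @value = N'true';

-- Create a snapshot publication
EXEC sp_addpublication
     @publication = N'PhoneListPub',
     @repl_freq = N'snapshot',
     @status = N'active';

-- Create the snapshot agent job for the publication
EXEC sp_addpublication_snapshot
     @publication = N'PhoneListPub';

-- Add the phone list table as an article
EXEC sp_addarticle
     @publication = N'PhoneListPub',
     @article = N'PhoneList',
     @source_object = N'PhoneList';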
The primary agents used for snapshot replication are the snapshot agent and distribution agent.
The snapshot agent
creates files that contain the schema of the publication and the data.
The files are temporarily stored in the snapshot folder of the
distribution server, and then the distribution jobs are recorded in the
distribution database.
The distribution agent is responsible for moving the schema and data from the distributor to the subscribers.
A few other agents are also
used; they handle other tasks needed for replication, such as cleaning up
files and history. In snapshot replication, after the snapshot has
been delivered to all the subscribers, these agents delete the
associated .bcp and .sch files from the distributor’s working directory.
Transactional Replication
Transactional
replication is the process of capturing transactions from the
transaction log of the published database and applying them to the
subscription databases. With SQL Server transactional replication, you
can publish all or part of a table, a view, or one or more stored
procedures as an article. All data updates are then stored in the
distribution database and sent and applied to any number of subscribing
servers. Obtaining these updates from the publishing database’s
transaction log is extremely efficient. No direct reading of tables is
required except during the initial snapshot, and only a minimal amount of traffic is generated over the network. This has made transactional replication the most often used method.
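For illustration, the following hedged sketch creates a transactional publication using the same stored procedures; the names are invented, and the database is assumed to already be enabled for publishing.

-- Create a transactional publication (changes flow continuously)
EXEC sp_addpublication
     @publication = N'OrdersPub',
     @repl_freq = N'continuous',
     @status = N'active';

-- Publish the Orders table as a log-based article
EXEC sp_addarticle
     @publication = N'OrdersPub',
     @article = N'Orders',
     @source_object = N'Orders',
     @type = N'logbased';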
As data changes are made, they
are propagated to the other sites in near real-time; you determine
the frequency of this propagation. Because changes are usually made only
at the publishing server, data conflicts are avoided for the most part.
As an example, push subscribers usually receive updates from the
publisher in a minute or less, depending on the speed and availability
of the network. Subscribers also can be set up for pull subscriptions.
This capability is useful for disconnected users who are not connected
to the network at all times.
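One quick way to gauge that latency on the publisher is the sp_replcounters system procedure, which reports replicated transaction counts, throughput, and latency in seconds for each published database.

-- Run on the publisher; returns one row per published database
EXEC sp_replcounters;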
The primary agents used for transactional replication are the snapshot agent, log reader agent, and distribution agent:
The snapshot agent
creates files that contain the schema of the publication and the data.
The files are stored in the snapshot folder of the distribution server,
and the distribution jobs are recorded in the distribution database.
The
log reader agent monitors the transaction log of the database that it
is set up to service. Each published database has its own log reader
agent, which copies the transactions from the
transaction log of that published database into the distribution
database.
The
distribution agent is responsible for moving the schema and data from
the distributor to the subscribers for the initial synchronization and
then moving all the subsequent transactions from the published database
to each subscriber as they come in. These transactions are stored in the
distribution database for a certain length of time and are eventually
purged.
A few other agents deal with
the other housekeeping issues surrounding data replication, such as
schema files cleanup, history cleanup, and transaction cleanup.
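For reference, the transaction cleanup task boils down to a call along the following lines against the distribution database (the exact job definition may vary by version); the retention values are in hours.

-- Purge delivered transactions older than the maximum retention period
EXEC distribution.dbo.sp_MSdistribution_cleanup
     @min_distretention = 0,
     @max_distretention = 72;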
Merge Replication
Merge replication
involves getting the publisher and all subscribers initialized and then
allowing data to be changed at all sites involved in the merge
replication, at the publisher and at any subscriber. All these changes
to the data are subsequently merged at certain intervals so that, again,
all copies of the database have identical data.
Occasionally, data conflicts
have to be resolved. The publisher does not always win in a conflict
resolution. Instead, the winner is determined by whatever criteria you
establish.
The primary agents used for merge replication are the snapshot agent and merge agent:
The snapshot
agent creates files that contain the schema of the publication and the
data. The files are stored in the snapshot folder of the distribution
server, and the distribution jobs are recorded in the distribution
database. This is essentially the same behavior as with the other
replication types.
The
merge agent takes the initial snapshot and applies it to all the
subscribers. It then reconciles all changes made on all the servers,
based on the rules you configure.
Preparing for Merge Replication
When you set up a
table for merge replication, SQL Server makes three schema changes to
the database. First, it must either identify or create a column that
uniquely identifies each row to be replicated. This column is used to
match rows across all the different copies of the table. If the
table already contains a column with the ROWGUIDCOL property, SQL Server automatically uses that column for the row identifier. If not, SQL Server adds a column called rowguid to the table. SQL Server also places an index on this rowguid column.
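A table created as follows, for example, already satisfies the row identifier requirement, so SQL Server would use its rowguid column rather than adding one. The table definition is illustrative.

CREATE TABLE dbo.PhoneList
(
    PhoneID  int              NOT NULL PRIMARY KEY,
    FullName nvarchar(100)    NOT NULL,
    Phone    varchar(20)      NOT NULL,
    -- ROWGUIDCOL marks this column as the merge row identifier
    rowguid  uniqueidentifier ROWGUIDCOL NOT NULL
             CONSTRAINT DF_PhoneList_rowguid DEFAULT NEWSEQUENTIALID()
);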
Next, SQL Server adds
triggers to the table to track changes that occur to the data in the
table and record them in the merge system tables. The triggers can track
changes at either the row or column level, depending on how you set it
up. SQL Server supports multiple triggers of the same type on a table,
so merge triggers do not interfere with user-defined triggers on the
table.
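If you want to see the system-generated merge triggers alongside your own, a query such as the following (illustrative, not required for replication) lists them; their names begin with MSmerge.

-- List merge replication triggers and the tables they belong to
SELECT t.name AS trigger_name,
       OBJECT_NAME(t.parent_id) AS table_name
FROM sys.triggers AS t
WHERE t.name LIKE 'MSmerge%';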
Finally, SQL Server adds new system tables to the database that contains the replicated tables. The MSmerge_contents and MSmerge_tombstone tables track the updates, inserts, and deletes. These tables rely on rowguid to track which rows have actually been changed.
The merge agent is
responsible for moving changed data from the site where it was changed
to all other sites in the replication scenario. When a row is updated,
the triggers added by SQL Server fire off and update the new system
tables, setting the generation column equal to 0 for the corresponding rowguid. When the merge agent runs, it collects the data from the rows where the generation column is 0
and then resets the generation values to values higher than the
previous generation numbers. This allows the merge agent to skip
data that has already been shared with other sites without having to
look through all the data. The merge agent then sends the changed data
to the other sites.
When the data reaches the
other sites, the data is merged with existing data according to rules
you have defined. These rules are flexible and highly extensible. The
merge agent evaluates existing and new data and resolves conflicts based
on priorities or which data was changed first. You can also create
custom resolution strategies using Component
Object Model (COM) components or custom stored procedures. After conflicts have
been handled, synchronization occurs to ensure that all sites have the
same data.
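As a hedged example, the following call assigns one of the Microsoft-supplied resolvers when adding a merge article; the publication, article, and LastModified column names are illustrative.

EXEC sp_addmergearticle
     @publication = N'PhoneListMergePub',
     @article = N'PhoneList',
     @source_object = N'PhoneList',
     -- Later change wins, decided by a datetime column
     @article_resolver = N'Microsoft SQL Server DATETIME (Later Wins) Conflict Resolver',
     @resolver_info = N'LastModified';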
The merge agent identifies conflicts using the MSmerge_contents table. In this table, a column called lineage is used to track the history of changes to a row. The agent updates the lineage
value whenever a user makes changes to the data in a row. The entry
into this column is a combination of a site identifier and the last
version of the row created at the site. As the merge agent is merging
all the changes that have occurred, it examines each site’s information
to see whether a conflict has occurred. If a conflict has occurred, the
agent initiates conflict resolution based on the criteria mentioned
earlier.
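If you are curious, you can inspect this metadata directly in a merge-published database; the query below (read-only, for illustration) shows recent entries, including the lineage values the agent compares.

-- Peek at merge change-tracking metadata (do not modify these tables)
SELECT TOP (10) rowguid, generation, lineage
FROM dbo.MSmerge_contents
ORDER BY generation DESC;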